Large Linguistic Corpus Reduction with SCP Algorithms

Similar resources

Large Linguistic Corpus Reduction with SCP Algorithms

Linguistic corpus design is a critical concern for building rich annotated corpora useful in different domains of applications. For example, speech technologies such as ASR (Automatic Speech Recognition) or TTS (Text-to-Speech) need a huge amount of speech data to train data-driven models or to produce synthetic speech. Collecting data is always related to costs (recording speech, verifying anno...

Linguistic Corpus Search

Searching corpora with linguistic questions requires both additional information encoded in the corpus and efficiency as in “traditional” search engines. We describe a search engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. This approach makes use of a ‘minimalist’ query language which nevertheless allows powerful searches by optionally ignoring posit...

Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks

In this article, we present the RATP-DECODA Corpus, which is composed of a set of 67 hours of speech from telephone conversations of a Customer Care Service (CCS). This corpus is already available online at http://sldr.org/sldr000847/fr in its first version. However, many enhancements have been made in order to allow the development of automatic techniques to transcribe conversations and to cap...

Large Sphenoethmoidal Encephalocele Associated with Agenesis of Corpus Callosum and Cleft Palate

Basal encephalocele is a rare craniofacial anomaly. In the present paper we report a 10-year-old boy who presented with cleft palate, congenital nystagmus, and hypertelorism. During preoperative evaluation for cleft palate repair, a pulsatile mass was detected in the pharynx. Magnetic resonance imaging showed a sphenoethmoidal type of basal encephalocele and agenesis of the corpus callosum. Neurosurgical...

Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics

We describe a biographical multidocument summarizer that summarizes information about people described in the news. The summarizer uses corpus statistics along with linguistic knowledge to select and merge descriptions of people from a document collection, removing redundant descriptions. The summarization components have been extensively evaluated for coherence, accuracy, and non-redundancy of...

Journal

Journal title: Computational Linguistics

Year: 2015

ISSN: 0891-2017, 1530-9312

DOI: 10.1162/coli_a_00225